{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## One-Hot Encoding of Transaction Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One-hot encoder class for transaction data in Python lists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> from mlxtend.preprocessing import OnehotTransactions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Encodes database transaction data in form of a Python list of lists into a one-hot encoded NumPy integer array."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Suppose we have the following transaction data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from mlxtend.preprocessing import OnehotTransactions\n",
    "\n",
    "dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],\n",
    "           ['Apple', 'Beer', 'Rice'],\n",
    "           ['Apple', 'Beer'],\n",
    "           ['Apple', 'Bananas'],\n",
    "           ['Milk', 'Beer', 'Rice', 'Chicken'],\n",
    "           ['Milk', 'Beer', 'Rice'],\n",
    "           ['Milk', 'Beer'],\n",
    "           ['Apple', 'Bananas']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using and `OnehotTransaction` object, we can transform this dataset into a one-hot encoded format suitable for typical machine learning APIs. Via the `fit` method, the `OnehotTransaction` encoder learns the unique labels in the dataset, and via the `transform` method, it transforms the input dataset (a Python list of lists) into a one-hot encoded NumPy integer array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 1, 1, 0, 1],\n",
       "       [1, 0, 1, 0, 0, 1],\n",
       "       [1, 0, 1, 0, 0, 0],\n",
       "       [1, 1, 0, 0, 0, 0],\n",
       "       [0, 0, 1, 1, 1, 1],\n",
       "       [0, 0, 1, 0, 1, 1],\n",
       "       [0, 0, 1, 0, 1, 0],\n",
       "       [1, 1, 0, 0, 0, 0]])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "oht = OnehotTransactions()\n",
    "oht_ary = oht.fit(dataset).transform(dataset)\n",
    "oht_ary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After fitting, the unique column names that correspond to the data array shown above can be accessed via the `columns_` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "oht.columns_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For our convenience, we can turn the one-hot encoded array into a pandas DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Apple</th>\n",
       "      <th>Bananas</th>\n",
       "      <th>Beer</th>\n",
       "      <th>Chicken</th>\n",
       "      <th>Milk</th>\n",
       "      <th>Rice</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Apple  Bananas  Beer  Chicken  Milk  Rice\n",
       "0      1        0     1        1     0     1\n",
       "1      1        0     1        0     0     1\n",
       "2      1        0     1        0     0     0\n",
       "3      1        1     0        0     0     0\n",
       "4      0        0     1        1     1     1\n",
       "5      0        0     1        0     1     1\n",
       "6      0        0     1        0     1     0\n",
       "7      1        1     0        0     0     0"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "pd.DataFrame(oht_ary, columns=oht.columns_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we desire, we can turn the one-hot encoded array back into a transaction list of lists via the `inverse_transform` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[['Apple', 'Beer', 'Chicken', 'Rice'],\n",
       " ['Apple', 'Beer', 'Rice'],\n",
       " ['Apple', 'Beer'],\n",
       " ['Apple', 'Bananas']]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first4 = oht_ary[:4]\n",
    "oht.inverse_transform(first4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## OnehotTransactions\n",
      "\n",
      "*OnehotTransactions()*\n",
      "\n",
      "One-hot encoder class for transaction data in Python lists\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "None\n",
      "\n",
      "**Attributes**\n",
      "\n",
      "columns_: list\n",
      "List of unique names in the `X` input list of lists\n",
      "\n",
      "### Methods\n",
      "\n",
      "<hr>\n",
      "\n",
      "*fit(X)*\n",
      "\n",
      "Learn unique column names from transaction DataFrame\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : list of lists\n",
      "\n",
      "    A python list of lists, where the outer list stores the\n",
      "    n transactions and the inner list stores the items in each\n",
      "    transaction.\n",
      "\n",
      "    For example,\n",
      "    [['Apple', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Apple', 'Beer', 'Rice'],\n",
      "    ['Apple', 'Beer'],\n",
      "    ['Apple', 'Bananas'],\n",
      "    ['Milk', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Milk', 'Beer', 'Rice'],\n",
      "    ['Milk', 'Beer'],\n",
      "    ['Apple', 'Bananas']]\n",
      "\n",
      "<hr>\n",
      "\n",
      "*fit_transform(X)*\n",
      "\n",
      "Fit a OnehotTransactions encoder and transform a dataset.\n",
      "\n",
      "<hr>\n",
      "\n",
      "*inverse_transform(onehot)*\n",
      "\n",
      "Transforms a one-hot encoded NumPy array back into transactions.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `onehot` : NumPy array [n_transactions, n_unique_items]\n",
      "\n",
      "    The NumPy one-hot encoded integer array of the input transactions,\n",
      "    where the columns represent the unique items found in the input\n",
      "    array in alphabetic order\n",
      "\n",
      "    For example,\n",
      "    ```\n",
      "    array([[1, 0, 1, 1, 0, 1],\n",
      "    [1, 0, 1, 0, 0, 1],\n",
      "    [1, 0, 1, 0, 0, 0],\n",
      "    [1, 1, 0, 0, 0, 0],\n",
      "    [0, 0, 1, 1, 1, 1],\n",
      "    [0, 0, 1, 0, 1, 1],\n",
      "    [0, 0, 1, 0, 1, 0],\n",
      "    [1, 1, 0, 0, 0, 0]])\n",
      "    ```\n",
      "    The corresponding column labels are available as self.columns_, e.g.,\n",
      "    ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `X` : list of lists\n",
      "\n",
      "    A python list of lists, where the outer list stores the\n",
      "    n transactions and the inner list stores the items in each\n",
      "    transaction.\n",
      "\n",
      "    For example,\n",
      "    ```\n",
      "    [['Apple', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Apple', 'Beer', 'Rice'],\n",
      "    ['Apple', 'Beer'],\n",
      "    ['Apple', 'Bananas'],\n",
      "    ['Milk', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Milk', 'Beer', 'Rice'],\n",
      "    ['Milk', 'Beer'],\n",
      "    ['Apple', 'Bananas']]\n",
      "    ```\n",
      "\n",
      "<hr>\n",
      "\n",
      "*transform(X)*\n",
      "\n",
      "Transform transactions into a one-hot encoded NumPy array.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : list of lists\n",
      "\n",
      "    A python list of lists, where the outer list stores the\n",
      "    n transactions and the inner list stores the items in each\n",
      "    transaction.\n",
      "\n",
      "    For example,\n",
      "    [['Apple', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Apple', 'Beer', 'Rice'],\n",
      "    ['Apple', 'Beer'],\n",
      "    ['Apple', 'Bananas'],\n",
      "    ['Milk', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Milk', 'Beer', 'Rice'],\n",
      "    ['Milk', 'Beer'],\n",
      "    ['Apple', 'Bananas']]\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `onehot` : NumPy array [n_transactions, n_unique_items]\n",
      "\n",
      "    The NumPy one-hot encoded integer array of the input transactions,\n",
      "    where the columns represent the unique items found in the input\n",
      "    array in alphabetic order\n",
      "\n",
      "    For example,\n",
      "    array([[1, 0, 1, 1, 0, 1],\n",
      "    [1, 0, 1, 0, 0, 1],\n",
      "    [1, 0, 1, 0, 0, 0],\n",
      "    [1, 1, 0, 0, 0, 0],\n",
      "    [0, 0, 1, 1, 1, 1],\n",
      "    [0, 0, 1, 0, 1, 1],\n",
      "    [0, 0, 1, 0, 1, 0],\n",
      "    [1, 1, 0, 0, 0, 0]])\n",
      "    The corresponding column labels are available as self.columns_, e.g.,\n",
      "    ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "with open('../../api_modules/mlxtend.preprocessing/OnehotTransactions.md', 'r') as f:\n",
    "    print(f.read())"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
